Why neural networks find simple solutions: The many regularizers of geometric complexity
In many contexts, simpler models are preferable to more complex ones, and controlling this model complexity is the goal of many methods in machine learning, such as regularization, hyperparameter tuning, and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, a measure of the variability of the model function computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics, such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization, all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.
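The quantity at the heart of this abstract, a discrete Dirichlet energy, can be made concrete as the mean squared Frobenius norm of the model's input-Jacobian over a batch. A minimal numpy sketch (the helper name and the finite-difference approximation are illustrative, not the paper's implementation):

```python
import numpy as np

def geometric_complexity(f, X, eps=1e-5):
    """Discrete Dirichlet energy of f over a batch X: the mean squared
    Frobenius norm of the input-Jacobian, approximated here by
    central finite differences."""
    X = np.atleast_2d(X)
    n, d = X.shape
    total = 0.0
    for x in X:
        cols = []
        for i in range(d):
            e = np.zeros(d)
            e[i] = eps
            cols.append((f(x + e) - f(x - e)) / (2 * eps))
        J = np.stack(cols, axis=-1)  # (output_dim, d)
        total += np.sum(J ** 2)
    return total / n

# Sanity check: a linear map's Jacobian is its weight matrix, so its
# geometric complexity equals ||W||_F^2 at every input.
W = np.array([[1.0, 2.0], [0.0, 3.0]])
f = lambda x: W @ x
X = np.random.default_rng(0).normal(size=(8, 2))
gc = geometric_complexity(f, X)
```

For a linear model the value is input-independent; for a trained network it varies over the data distribution, which is why the paper averages over a batch.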
Unified View of Matrix Completion under General Structural Constraints
Suriya Gunasekar, Arindam Banerjee, Joydeep Ghosh
Matrix completion problems have been widely studied under special low-dimensional structures, such as low rank or structure induced by decomposable norms. In this paper, we present a unified analysis of matrix completion under general low-dimensional structural constraints induced by any norm regularization. We consider two estimators for the general problem of structured matrix completion, and provide unified upper bounds on the sample complexity and the estimation error. Our analysis relies on generic chaining, and we establish two intermediate results of independent interest: (a) in characterizing the size or complexity of low-dimensional subsets in a high-dimensional ambient space, a certain partial complexity measure encountered in the analysis of matrix completion problems is characterized in terms of the well-understood complexity measure of Gaussian widths, and (b) it is shown that a form of restricted strong convexity holds for matrix completion problems under general norm regularization. Further, we provide several non-trivial examples of structures included in our framework, notably including the recently proposed spectral $k$-support norm.
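For the most common instance of this setup, nuclear-norm-regularized completion, the workhorse is the proximal operator of the nuclear norm (soft-thresholding of singular values). A small numpy sketch of a soft-impute-style iteration; the function names, step size, and regularization strength are illustrative choices, not the paper's estimators:

```python
import numpy as np

def nuclear_prox(Y, lam):
    """Proximal operator of lam * nuclear norm: soft-threshold the
    singular values of Y. This is the core step of SVT / soft-impute
    style solvers for norm-regularized matrix completion."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(1)
M = np.outer(rng.normal(size=6), rng.normal(size=5))  # rank-1 truth
mask = rng.random(M.shape) < 0.7                      # ~70% observed

X = np.zeros_like(M)
for _ in range(200):
    # Fill observed entries from the data, keep current estimate on
    # the unobserved ones, then shrink toward low rank.
    X = nuclear_prox(X + mask * (M - X), lam=0.1)
```

The point of the paper's framework is that the same template applies with the prox of any norm, not just the nuclear norm, with sample complexity governed by the Gaussian width of the associated structured set.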
Estimation with Norm Regularization
Analysis of estimation error and associated structured statistical recovery based on norm-regularized regression, e.g., the Lasso, needs to consider four aspects: the norm, the loss function, the design matrix, and the noise vector. This paper generalizes such estimation error analysis along all four aspects relative to the existing literature. We characterize the restricted error set, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to *any* norm. Precise characterizations of the bound are presented for a variety of noise vectors and design matrices, including sub-Gaussian, anisotropic, and dependent samples, and for loss functions including least squares and generalized linear models. Gaussian widths, as a measure of the size of suitable sets, and associated tools play a key role in our generalized analysis.
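Gaussian width, the complexity measure driving these bounds, is easy to estimate by Monte Carlo: $w(S) = \mathbb{E}\,\sup_{s \in S} \langle g, s \rangle$ for $g \sim N(0, I_d)$. Small width means a random Gaussian direction correlates weakly with the set, which is why the $\ell_1$ ball behind the Lasso is "small" in high dimensions. A numpy sketch (the function name and sample sizes are my own illustration):

```python
import numpy as np

def gaussian_width(sup_fn, d, n_samples=20000, seed=0):
    """Monte Carlo estimate of w(S) = E[sup_{s in S} <g, s>],
    given a routine computing the supremum for a fixed g."""
    g = np.random.default_rng(seed).normal(size=(n_samples, d))
    return np.mean([sup_fn(gi) for gi in g])

d = 100
# Unit l1 ball: sup_{||s||_1 <= 1} <g, s> = ||g||_inf,
# so w grows only like sqrt(2 log d).
w_l1 = gaussian_width(lambda g: np.max(np.abs(g)), d)
# Unit l2 ball: the sup is ||g||_2, so w grows like sqrt(d).
w_l2 = gaussian_width(np.linalg.norm, d)
```

The gap between the two estimates (roughly $\sqrt{2\log d}$ versus $\sqrt{d}$) is exactly the sample-complexity saving that sparsity-inducing norms buy in the bounds above.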
7 Appendix
Additional experimental results are presented from Section 7.12 onward. In Section 5.2, "Alignment of adversarial perturbations with singular vectors", and in Section 4.2, we saw that it is the dominant singular vector, the one corresponding to the largest singular value, that determines the optimal adversarial perturbation to the Jacobian and hence the maximal amount of signal gain that can be induced when propagating a perturbation. The gradient of the loss w.r.t. the logits of the classifier plays the role of $z$ in the derivation; the rest is careful notation. By Hölder's inequality, for non-zero $z$ and $v$ with conjugate exponents $1/p + 1/q = 1$, we have $z^\top v / \|v\|_p \le \|z\|_q$ (40); see the comment after Equation 1.1 in [21]. Moreover, if $v$ is of the form $v = \operatorname{sign}(z) \odot |z|^{q-1}$, the bound is attained, where we have used that $(q-1)p = q$. Thus, the numerator (and hence the direction) remains the same.
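The Hölder step is easy to verify numerically: for conjugate exponents, the ratio $z^\top v / \|v\|_p$ is maximized at $v = \operatorname{sign}(z)\odot|z|^{q-1}$, where it equals $\|z\|_q$. A short numpy check (variable names are my own):

```python
import numpy as np

p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 1/1.5 = 1
rng = np.random.default_rng(0)
z = rng.normal(size=10)

def lp(x, r):
    """The l_r norm of x."""
    return np.sum(np.abs(x) ** r) ** (1.0 / r)

# The maximizing direction from the derivation above.
v_star = np.sign(z) * np.abs(z) ** (q - 1)
ratio_at_vstar = z @ v_star / lp(v_star, p)

# Any other v attains at most the same value, by Holder's inequality.
v_rand = rng.normal(size=10)
ratio_at_vrand = z @ v_rand / lp(v_rand, p)
```

The identity $(q-1)p = q$ is what makes the algebra close: $z^\top v^\star = \sum |z|^q$ while $\|v^\star\|_p = (\sum |z|^q)^{1/p}$, so the ratio collapses to $\|z\|_q$.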
Data-Driven and Theory-Guided Pseudo-Spectral Seismic Imaging Using Deep Neural Network Architectures
Full Waveform Inversion (FWI) reconstructs high-resolution subsurface models via multivariate optimization but faces challenges with solver selection and data availability. Deep Learning (DL) offers a promising alternative, bridging data-driven and physics-based methods. While DL-based FWI has been explored in the time domain, the pseudo-spectral approach remains underutilized, despite its success in classical FWI. This thesis integrates pseudo-spectral FWI into DL, formulating both data-driven and theory-guided approaches using Deep Neural Networks (DNNs) and Recurrent Neural Networks (RNNs). These methods were theoretically derived, tested on synthetic and Marmousi datasets, and compared with deterministic and time-domain approaches. Results show that data-driven pseudo-spectral DNNs outperform classical FWI in deeper and over-thrust regions due to their global approximation capability. Theory-guided RNNs yield greater accuracy, with lower error and better fault identification. While DNNs excel in velocity contrast recovery, RNNs provide superior edge definition and stability in shallow and deep sections. Beyond enhancing FWI performance, this research identifies broader applications of DL-based inversion and outlines future directions for these frameworks.
Reviews: Learning Low-Dimensional Metrics
Summary of the paper: The paper considers the problem of learning a low-rank / sparse and low-rank Mahalanobis distance under relative constraints dist(x_i,x_j) < dist(x_i,x_k), in the framework of regularized empirical risk minimization using trace norm / l_{1,2} norm regularization. The contributions are three theorems that provide: 1) an upper bound on the estimation error of the empirical risk minimizer; 2) a minimax lower bound on the error under the log loss function associated with the model, showing near-optimality of empirical risk minimization in this case; 3) an upper bound on the deviation of the learned Mahalanobis distance from the true one (in terms of Frobenius norm) under the log loss function associated with the model. Quality: My main concern here is the close resemblance to Jain et al. (2016). Big parts of the present paper have a one-to-one correspondence to parts of that paper. Jain et al. study the problem of learning a Gram matrix G given relative constraints dist(x_i,x_j) < dist(x_i,x_k), but without being given the coordinates of the points x_i (the ordinal embedding problem).
Nuclear Norm Regularization for Deep Learning
Scarvelis, Christopher, Solomon, Justin
Penalizing the nuclear norm of a function's Jacobian encourages it to locally behave like a low-rank linear map. Such functions vary locally along only a handful of directions, making the Jacobian nuclear norm a natural regularizer for machine learning problems. However, this regularizer is intractable for high-dimensional problems, as it requires computing a large Jacobian matrix and taking its singular value decomposition. We show how to efficiently penalize the Jacobian nuclear norm using techniques tailor-made for deep learning. We prove that for functions parametrized as compositions $f = g \circ h$, one may equivalently penalize the average squared Frobenius norm of $Jg$ and $Jh$. We then propose a denoising-style approximation that avoids the Jacobian computations altogether. Our method is simple, efficient, and accurate, enabling Jacobian nuclear norm regularization to scale to high-dimensional deep learning problems. We complement our theory with an empirical study of our regularizer's performance and investigate applications to denoising and representation learning.
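The composition result has a classical matrix-level analogue worth seeing concretely: $\|A\|_* = \min_{A=BC} \tfrac{1}{2}(\|B\|_F^2 + \|C\|_F^2)$, with the minimum attained by the balanced SVD factorization. A numpy check (variable names are my own; this illustrates the identity, not the paper's deep-learning method):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 4))
nuc = np.sum(np.linalg.svd(A, compute_uv=False))  # nuclear norm

# Balanced factorization B = U sqrt(S), C = sqrt(S) Vt attains the min.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U * np.sqrt(s)                 # columns scaled by sqrt(s)
C = np.sqrt(s)[:, None] * Vt       # rows scaled by sqrt(s)
balanced = 0.5 * (np.sum(B ** 2) + np.sum(C ** 2))

# Any other factorization of A upper-bounds the nuclear norm.
G = rng.normal(size=(4, 4))        # invertible w.p. 1
B2, C2 = B @ G, np.linalg.solve(G, C)
other = 0.5 * (np.sum(B2 ** 2) + np.sum(C2 ** 2))
```

This is why penalizing the average squared Frobenius norms of $Jg$ and $Jh$ for $f = g \circ h$ can stand in for the (intractable) nuclear norm of $Jf$: the Frobenius surrogate is minimized exactly when the factorization is balanced, where it equals the nuclear norm.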